{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "ad66c9aa",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.embeddings.openai import OpenAIEmbeddings\n",
    "from langchain.vectorstores import FAISS\n",
    "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
    "from langchain import OpenAI\n",
    "from langchain.chains import RetrievalQA\n",
    "from langchain.document_loaders import DirectoryLoader\n",
    "import magic\n",
    "import os\n",
    "import nltk\n",
    "\n",
    "openai_api_key = os.getenv(\"OPENAI_API_KEY\", \"YourAPIKey\")\n",
    "\n",
    "# Uncomment on first run: unstructured relies on this NLTK tagger when parsing text files\n",
    "# nltk.download('averaged_perceptron_tagger')\n",
    "\n",
    "# pip install unstructured\n",
    "# Other dependencies to install https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html\n",
    "# pip install python-magic-bin\n",
    "# pip install faiss-cpu openai tiktoken"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e8a28a08",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get your loader ready\n",
    "loader = DirectoryLoader('../data/PaulGrahamEssaySmall/', glob='**/*.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6a9740d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load up your text into documents\n",
    "documents = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3153f864",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get your text splitter ready\n",
    "text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a792c6fb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split your documents into texts\n",
    "texts = text_splitter.split_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "d2cad0de",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get your embeddings model ready (the texts are embedded in the next step)\n",
    "embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "734ed265",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Embed your texts and index them in a FAISS vector store\n",
    "docsearch = FAISS.from_documents(texts, embeddings)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "66826924",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load up your LLM\n",
    "llm = OpenAI(openai_api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "817a0ece",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create your question-answering chain; \"stuff\" puts all retrieved chunks into a single prompt\n",
    "qa = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=docsearch.as_retriever())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "20b81063",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' McCarthy discovered that a programming language could be constructed from a handful of simple operators and a notation for functions, using a data structure called a list for both code and data.'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Run a query\n",
    "query = \"What did McCarthy discover?\"\n",
    "qa.run(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f22231c5",
   "metadata": {},
   "source": [
    "### Returning the source documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "694343cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "qa = RetrievalQA.from_chain_type(llm=llm,\n",
    "                                chain_type=\"stuff\",\n",
    "                                retriever=docsearch.as_retriever(),\n",
    "                                return_source_documents=True)\n",
    "query = \"What did McCarthy discover?\"\n",
    "result = qa({\"query\": query})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "bec53323",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "' McCarthy discovered a way to build a whole programming language using a handful of simple operators and a notation for functions. He called this language Lisp, for \"List Processing,\" because one of his key ideas was to use a simple data structure called a list for both code and data.'"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result['result']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "32246ae3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='May 2001\\n\\n(I wrote this article to help myself understand exactly\\n\\nwhat McCarthy discovered.  You don\\'t need to know this stuff\\n\\nto program in Lisp, but it should be helpful to\\n\\nanyone who wants to\\n\\nunderstand the essence of Lisp \\x97 both in the sense of its\\n\\norigins and its semantic core.  The fact that it has such a core\\n\\nis one of Lisp\\'s distinguishing features, and the reason why,\\n\\nunlike other languages, Lisp has dialects.)In 1960, John\\n\\nMcCarthy published a remarkable paper in\\n\\nwhich he did for programming something like what Euclid did for\\n\\ngeometry. He showed how, given a handful of simple\\n\\noperators and a notation for functions, you can\\n\\nbuild a whole programming language.\\n\\nHe called this language Lisp, for \"List Processing,\"\\n\\nbecause one of his key ideas was to use a simple\\n\\ndata structure called a list for both\\n\\ncode and data.It\\'s worth understanding what McCarthy discovered, not\\n\\njust as a landmark in the history of computers, but as', metadata={'source': '../data/PaulGrahamEssaySmall/rootsoflisp.txt'}),\n",
       " Document(page_content=\"itself.  To understand what McCarthy meant by this,\\n\\nwe're going to retrace his steps, with his mathematical\\n\\nnotation translated into running Common Lisp code.\", metadata={'source': '../data/PaulGrahamEssaySmall/rootsoflisp.txt'}),\n",
       " Document(page_content=\"a model for what programming is tending to become in\\n\\nour own time.  It seems to me that there have been\\n\\ntwo really clean, consistent models of programming so\\n\\nfar: the C model and the Lisp model.\\n\\nThese two seem points of high ground, with swampy lowlands\\n\\nbetween them.  As computers have grown more powerful,\\n\\nthe new languages being developed have been moving\\n\\nsteadily toward the Lisp model.  A popular recipe\\n\\nfor new programming languages in the past 20 years\\n\\nhas been to take the C model of computing and add to\\n\\nit, piecemeal, parts taken from the Lisp model,\\n\\nlike runtime typing and garbage collection.In this article I'm going to try to explain in the\\n\\nsimplest possible terms what McCarthy discovered.\\n\\nThe point is not just to learn about an interesting\\n\\ntheoretical result someone figured out forty years ago,\\n\\nbut to show where languages are heading.\\n\\nThe unusual thing about Lisp \\x97 in fact, the defining\\n\\nquality of Lisp \\x97 is that it can be written in\", metadata={'source': '../data/PaulGrahamEssaySmall/rootsoflisp.txt'}),\n",
       " Document(page_content=\"January 2023(Someone fed my essays into GPT to make something that could answer\\n\\nquestions based on them, then asked it where good ideas come from.  The\\n\\nanswer was ok, but not what I would have said. This is what I would have said.)The way to get new ideas is to notice anomalies: what seems strange,\\n\\nor missing, or broken? You can see anomalies in everyday life (much\\n\\nof standup comedy is based on this), but the best place to look for\\n\\nthem is at the frontiers of knowledge.Knowledge grows fractally.\\n\\nFrom a distance its edges look smooth, but when you learn enough\\n\\nto get close to one, you'll notice it's full of gaps. These gaps\\n\\nwill seem obvious; it will seem inexplicable that no one has tried\\n\\nx or wondered about y. In the best case, exploring such gaps yields\\n\\nwhole new fractal buds.\", metadata={'source': '../data/PaulGrahamEssaySmall/getideas.txt'})]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result['source_documents']"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
